A lognormal tied mixture model of pitch for prosody based speaker recognition

Authors

M. Kemal Sönmez

Larry P. Heck

Mitch Weintraub

Elizabeth Shriberg

Abstract

Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks which are subject to doubling and/or halving. We first argue by a simple correlation model and empirically demonstrate by QQ plots that “clean” pitch is distributed with a lognormal distribution rather than the often assumed normal distribution. Second, we present a probabilistic model for estimated pitch via a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances for a total of four free parameters. We use the obtained pitch statistics as features in speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (subset of Switchboard) and report results on the most difficult portion of the database: the “one-session” condition with males only for both the claimant and imposter speakers. Pitch statistics provide 22% reduction in false alarm rate at 1% miss rate and 11% reduction in false alarm rate at 10% miss rate over the cepstrum-only system.
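The tied mixture described in the abstract can be illustrated with a small sketch. The parameterization below is an assumption made for illustration, not the paper's stated formulation: in the log-pitch domain the three components are taken to be Gaussians centered at mu - log 2 (halving), mu (clean pitch), and mu + log 2 (doubling), sharing one standard deviation sigma, with three weights that sum to one. That gives four free parameters (mu, sigma, and two independent weights), which matches the count in the abstract. The abstract does not say how the parameters are estimated; the expectation-maximization (EM) loop here is a standard technique swapped in for the example, and the function name `fit_tied_lognormal_mixture` is hypothetical.

```python
import numpy as np

LOG2 = np.log(2.0)
# Assumed log-domain offsets of the halved, clean, and doubled components.
OFFSETS = np.array([-LOG2, 0.0, LOG2])


def fit_tied_lognormal_mixture(f0_hz, n_iter=100):
    """Fit a 3-component tied mixture to voiced pitch values (Hz) with EM.

    Returns (mu, sigma, weights), where mu and sigma describe the clean
    log-pitch distribution and weights are ordered (halved, clean, doubled).
    This is an illustrative sketch, not the authors' implementation.
    """
    x = np.log(np.asarray(f0_hz, dtype=float))

    # Initial guesses: the median is robust to a minority of doubled/halved frames.
    mu = np.median(x)
    sigma = max(np.std(x), 1e-3)
    w = np.array([0.1, 0.8, 0.1])

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        # The shared sigma makes the Gaussian normalizer cancel across components.
        z = (x[:, None] - (mu + OFFSETS)) / sigma
        log_r = np.log(w) - 0.5 * z ** 2
        log_r -= np.logaddexp.reduce(log_r, axis=1, keepdims=True)
        r = np.exp(log_r)

        # M-step: weights, then the tied mean and tied standard deviation.
        w = np.clip(r.mean(axis=0), 1e-12, None)
        w /= w.sum()
        mu = np.sum(r * (x[:, None] - OFFSETS)) / x.size
        resid = x[:, None] - mu - OFFSETS
        sigma = max(np.sqrt(np.sum(r * resid ** 2) / x.size), 1e-6)

    return mu, sigma, w
```

Under these assumptions, `np.exp(mu)` is the median of the clean pitch distribution in Hz, and the first and last weights estimate how often the tracker halves or doubles. The tying is what keeps the estimate of mu from being pulled toward the doubled or halved frames, which is the kind of bias the abstract says the model is meant to avoid.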

Similar resources

Multi-pitch Detection Algorithm Using Constrained Gaussian Mixture Model and Information Criterion for Simultaneous Speech

In this paper, a co-channel multi-pitch detection algorithm is described. We suggest the importance of this when prosodic information needs to be extracted separately from the respective F0 patterns of concurrent utterances. Though temporal continuity of speech prosody should be considered, we discuss a process done independently on each single frame as the first step. A model of multiple harmoni...

Incorporating Prosodic with Acoustic information for ISCSLP’2006 Speaker Recognition Evaluation- Robust Cross-Channel Speaker Verification

In this paper, we present our speaker verification (SV) systems for the cross-channel text-independent and dependent speaker verification (TI-SV and TD-SV) tasks of ISCSLP’2006 speaker recognition evaluation (ISCSLP2006-SRE). To address the cross-channel issues and take advantage of the unique characteristics of Mandarin (i.e., tonal language), prosodic contours are modeled to assist the state-...

Modeling dynamic prosodic variation for speaker verification

Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker’s distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual’s speaking style. In this work, we take a first step toward capturing such ...

Text-independent Speaker Identification Based on MAP Channel Compensation and Pitch-dependent Features

One major source of performance decline in speaker recognition systems is channel mismatch between training and testing. This paper focuses on improving the channel robustness of speaker recognition systems in two respects: channel compensation techniques and channel-robust features. The system is a text-independent speaker identification system based on two-stage recognition. In the aspect of channel ...

Improving automatic emotion recognition from speech signals

We present a speech signal driven emotion recognition system. Our system is trained and tested with the INTERSPEECH 2009 Emotion Challenge corpus, which includes spontaneous and emotionally rich recordings. The challenge includes classifier and feature sub-challenges with five-class and two-class classification problems. We investigate prosody related, spectral and HMM-based features for the ev...

Publication date: 1997
